Fix for Llama4 Maverick performance drop #904

Merged
wpyszka merged 3 commits into vllm-project:releases/v0.14.1 from jkaniecki:Fix_maverick_0_14_1
Feb 2, 2026


@jkaniecki jkaniecki commented Jan 30, 2026

Contributor

This fixes the Llama4 Maverick performance drop. torch.compile does not handle functions that take methods as inputs, so to avoid recompilations we need to declare the scale function directly.

Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
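
The sketch below illustrates one way that passing a method as an input can defeat a compile cache, while a directly declared function does not. It is a toy model under stated assumptions, not torch.compile's real guard logic and not the code from this PR: `ToyCompileCache`, `ToyAttention`, `get_scale`, and `get_scale_direct` are hypothetical names, and the cache simply "compiles" once per distinct callable object. It relies only on a standard Python fact: each attribute access such as `attn.get_scale` creates a new bound-method object.

```python
from typing import Any, Callable, List


class ToyCompileCache:
    """Toy stand-in for a compile cache (NOT torch.compile's real guard logic).

    It "compiles" once per distinct callable object, compared by identity.
    """

    def __init__(self) -> None:
        # Holding references also keeps the callables alive, so identities stay distinct.
        self._seen: List[Callable[..., Any]] = []
        self.compilations = 0

    def run(self, fn: Callable[[float], float], x: float) -> float:
        if not any(fn is seen for seen in self._seen):
            self._seen.append(fn)
            self.compilations += 1
        return fn(x)


class ToyAttention:
    """Hypothetical module; `attn_scale` and `get_scale` are illustrative names."""

    def __init__(self, attn_scale: float) -> None:
        self.attn_scale = attn_scale

    def get_scale(self, x: float) -> float:
        return x * self.attn_scale


def get_scale_direct(x: float) -> float:
    """Scale function declared directly at module level (hypothetical)."""
    return x * 0.5


attn = ToyAttention(attn_scale=0.5)

method_cache = ToyCompileCache()
for _ in range(3):
    # Each `attn.get_scale` access builds a fresh bound-method object.
    method_cache.run(attn.get_scale, 2.0)

direct_cache = ToyCompileCache()
for _ in range(3):
    # The module-level function is the same object on every call.
    direct_cache.run(get_scale_direct, 2.0)

print(method_cache.compilations, direct_cache.compilations)
```

In this toy model the method-based path "compiles" three times and the direct function only once. Whether torch.compile recompiles for the same reason depends on its guard implementation, which this sketch does not attempt to reproduce.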
Copilot AI review requested due to automatic review settings January 30, 2026 15:48

Copilot AI left a comment

Contributor


Pull request overview

This PR reverts a performance-degrading patch for Llama4 Maverick models by removing the patch_llama4_get_attn_scale function and its invocation. The patch was causing recompilations that significantly reduced performance.

Changes:

  • Removed the patch_llama4_get_attn_scale function that was modifying attention scale behavior for Llama4 models
  • Removed the call to patch_llama4_get_attn_scale from apply_model_specific_patches



github-actions Bot commented Feb 2, 2026


✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

@wpyszka wpyszka merged commit 26d3799 into vllm-project:releases/v0.14.1 Feb 2, 2026
53 checks passed
@michalkuligowski

Collaborator

Is this needed on main?

Luca-Calabria added a commit to Luca-Calabria/vllm-gaudi that referenced this pull request Feb 6, 2026
Signed-off-by: Luca Calabria <luca.calabria@intel.com>

5 participants